
134 research outputs found

Forbidden Directed Minors and Kelly-width

Author: Kintali Shiva
Zhang Qiuyi
Publication date: 11/01/2014

Partial 1-trees are undirected graphs of treewidth at most one. Similarly, partial 1-DAGs are directed graphs of Kelly-width at most two. It is well known that an undirected graph is a partial 1-tree if and only if it has no $K_3$ minor. In this paper, we generalize this characterization to partial 1-DAGs. We show that partial 1-DAGs are characterized by three forbidden directed minors, $K_3$, $N_4$ and $M_5$
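The undirected characterization quoted above can be checked directly: a simple undirected graph has no K_3 minor exactly when it contains no cycle, that is, when it is a forest. The following is a minimal sketch of that test using union-find. The function name `is_forest` and the edge-list input format are assumptions made for illustration and do not come from the paper; the directed (Kelly-width) case is not covered.

```python
def is_forest(num_vertices, edges):
    """Return True if a simple undirected graph is acyclic (treewidth <= 1).

    Assumes vertices are labelled 0..num_vertices-1 and the graph is simple.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # The edge joins two vertices that are already connected,
            # so it closes a cycle, which contracts to K_3.
            return False
        parent[root_u] = root_v
    return True
```

A star such as `is_forest(4, [(0, 1), (1, 2), (1, 3)])` passes, while any cycle, for example a 4-cycle, fails because it can be contracted to a triangle.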

arXiv.org e-Print Archive

CiteSeerX

Optimal Scalarizations for Sublinear Hypervolume Regret

Author: Zhang Qiuyi
Publication date: 06/07/2023

Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, such as recently in RLHF for training reward models that align human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}(d T^{-1/2} + T^{-1/k})$. We support our theory with strong empirical performance of simple hypervolume scalarizations, which consistently outperform both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization, such as EHVI. Comment: ICML 2023 Worksho
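The abstract does not spell out the scalarization itself. As a rough illustration, the sketch below assumes the form commonly used in the random hypervolume scalarization literature, s_w(y) = min_i (max(0, y_i / w_i))^k with w drawn uniformly from the positive unit sphere, whose expected maximum over a point set, scaled by c_k = pi^(k/2) / (2^k Gamma(k/2 + 1)), equals the hypervolume dominated relative to the origin. That formula, the reference point at the origin, and all function names are assumptions for illustration, not details taken from this listing.

```python
import math
import random


def sample_positive_sphere(k, rng):
    """Draw a weight vector uniformly from the positive part of the unit sphere."""
    g = [abs(rng.gauss(0.0, 1.0)) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def hypervolume_scalarization(y, weights):
    """Assumed form: min_i max(0, y_i / w_i), raised to the power k."""
    k = len(y)
    return min(max(0.0, yi / wi) for yi, wi in zip(y, weights)) ** k


def estimate_hypervolume(points, num_samples=20000, seed=0):
    """Monte Carlo estimate of the hypervolume dominated w.r.t. the origin."""
    rng = random.Random(seed)
    k = len(points[0])
    # Normalizing constant: volume of the unit k-ball divided by 2^k.
    c_k = math.pi ** (k / 2) / (2 ** k * math.gamma(k / 2 + 1))
    total = 0.0
    for _ in range(num_samples):
        w = sample_positive_sphere(k, rng)
        total += max(hypervolume_scalarization(y, w) for y in points)
    return c_k * total / num_samples
```

For the two points (2, 1) and (1, 2) the dominated region is the union of two boxes with area 2 + 2 - 1 = 3, and the estimate approaches that value as the number of sampled weight vectors grows.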

arXiv.org e-Print Archive

Loan Loss Provisioning in Chinese Commercial Banks

Author: Zhang Qiuyi

The object of this paper is to jointly test the existence of income smoothing, capital management behavior and procyclicality with respect to Chinese commercial banks' provisioning during 2009-2014. Our results provide evidence for income smoothing behavior and procyclical provisioning, but we find no evidence for capital management behavior. In order to address the problems of income smoothing and procyclical provisioning, we give the following suggestions. Bank regulators should further promote the implementation of Basel III regimes among Chinese commercial banks. The dynamic provisioning practice should be promoted and the accounting disclosure requirements should be improved. Besides, bank regulators should require banks to write off uncollectible loans in time and should also strengthen the supervision and scrutiny of banks' activities

Nottingham ePrints

Optimal Query Complexities for Dynamic Trace Estimation

Author: Woodruff David P.
Zhang Fred
Zhang Qiuyi
Publication date: 30/09/2022

We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly, such as during an optimization process. Specifically, for any $m$ matrices $A_1,\ldots,A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability with an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ from Dharangutte and Musco (NeurIPS, 2021). Our procedure works without additional norm bounds on $A_i$ and can be generalized to a bound for the $p$-th Schatten norm for $p \in [1,2]$, giving a complexity of $\widetilde{O}\left(m \alpha\left(\sqrt{\log(1/\delta)}/\epsilon\right)^p + m \log(1/\delta)\right)$. By using novel reductions to communication complexity and information-theoretic analyses of Gaussian matrices, we provide matching lower bounds for static and dynamic trace estimation in all relevant parameters, including the failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation, resolving open questions of prior work. Comment: 30 page
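For orientation, the static baseline named in the abstract, Hutchinson's estimator, averages the quadratic forms g^T A g over random sign vectors g, touching A only through matrix-vector products. The sketch below shows that baseline only; it is not the paper's binary tree summation procedure, and the helper names `matvec` and `hutchinson_trace` are hypothetical.

```python
import random


def matvec(a, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def hutchinson_trace(matvec_fn, n, num_queries, seed=0):
    """Estimate tr(A) using num_queries matrix-vector products.

    matvec_fn maps a length-n vector x to A x; A is never read directly.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_queries):
        # Rademacher query vector: independent +1 / -1 entries.
        g = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        ag = matvec_fn(g)
        # g^T A g is an unbiased estimate of the trace.
        total += sum(gi * agi for gi, agi in zip(g, ag))
    return total / num_queries
```

With sign vectors the diagonal contributes exactly, so the estimate is exact for diagonal matrices; all of the variance comes from the off-diagonal entries.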

arXiv.org e-Print Archive

New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy

Author: Rao Satish
Warnow Tandy
Zhang Qiuyi (Richard)
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Publication date: 01/01/2018

Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCM_NJ, published in SODA 2001. The main empirical advantage of DCM_NJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCM_NJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCM_NJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees)

Dagstuhl Research Online Publication Server

Convergence Results for Neural Networks via Electrodynamics

Author: Panigrahy Rina
Rahimi Ali
Sachdeva Sushant
Zhang Qiuyi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)
Publication date: 01/01/2018

We study whether a depth two neural network can learn another depth two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: Given k fixed protons in R^d, and k electrons, each moving due to the attractive force from the protons and repulsive force from the remaining electrons, whether at equilibrium all the electrons will be matched up with the protons, up to a permutation. Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time

Dagstuhl Research Online Publication Server